Abstract. We develop a pseudo-metric analogue of bisimulation for generalized semi-Markov
processes. The kernel of this pseudo-metric corresponds to bisimulation; thus we
have extended bisimulation for continuous-time probabilistic processes to a much broader 
class of distributions than exponential distributions. This pseudo-metric gives a useful 
handle on approximate reasoning in the presence of numerical information — such as 
probabilities and time — in the model. 

We give a fixed point characterization of the pseudo-metric. This makes available 
coinductive reasoning principles for reasoning about distances. We demonstrate that our 
approach is insensitive to potentially ad hoc articulations of distance by showing that 
it is intrinsic to an underlying uniformity. We provide a logical characterization of this 
uniformity using a real-valued modal logic.

We show that several quantitative properties of interest are continuous with respect to 
the pseudo-metric. Thus, if two processes are metrically close, then observable quantitative 
properties of interest are indeed close. 



1. Introduction 

The starting point and conceptual basis for classical investigations in concurrency are 
the notions of equivalence and congruence of processes: when can two processes be considered
the same and when can they be substituted for each other? Most investigations
into timed [AH92, AD94] and probabilistic concurrent processes are based on equivalences
of one kind or another, e.g. [CSZ92, Han94, Hil94, HM, SL95, PLS00], to name but a few.

As has been argued before [JS90, DGJP99, DGJP04], this style of reasoning is fragile in
the sense of being too dependent on the exact numerical values of times and probabilities.
Previously this had been pointed out for probability, but the same remarks apply, mutatis mutandis,
to real time as well. Consider the following two paradigmatic examples:
• Consider the probabilistic choice operator $A_1 +_p A_2$, which starts $A_1$ with probability
$p$ and $A_2$ with probability $1 - p$. Consider $A_1 +_{p+\epsilon} A_2$ and $A_1 +_{p+2\epsilon} A_2$. In
traditional exact reasoning, the best that one can do is to say that all these three
processes are inequivalent. Clearly, there is a gradation here: $A_1 +_{p+\epsilon} A_2$ is closer
to $A_1 +_p A_2$ than $A_1 +_{p+2\epsilon} A_2$ is to $A_1 +_p A_2$.

• Consider the $\mathrm{delay}_t.A$ operator that starts $A$ after a delay of $t$ time units. Consider
$\mathrm{delay}_{t+\epsilon}.A$ and $\mathrm{delay}_{t+2\epsilon}.A$. Again, in exact reasoning, the best that one can
do is to say that all these three processes are inequivalent. Again, $\mathrm{delay}_{t+\epsilon}.A$ is
intuitively closer to $\mathrm{delay}_t.A$ than $\mathrm{delay}_{t+2\epsilon}.A$ is to $\mathrm{delay}_t.A$.

In both examples, the intuitive reasoning behind relative distances is supported by calcu- 
lated numerical values of quantitative observables — expectations in the probabilistic case 
and (cumulative) rewards in the timed case. 

The fragility of exact equivalence is particularly unfortunate for two reasons: firstly, 
the timings and probabilities appearing in models should be viewed as numbers with some 
error estimate. Secondly, probability distributions over uncountably many states arise 
in even superficially discrete paradigms such as generalized semi-Markov processes (e.g.
see [She87] for a textbook survey), and discrete approximations are used for algorithmic
purposes [BHK99, HCH+02]. These approximants do not match the continuous state model
exactly and force us to think about approximate reasoning principles — e.g. when does it 
suffice to prove a property about an approximant? 

Thus, we really want an "approximate" notion of equality of processes. In the probabilistic
context, Jou and Smolka [JS90] propose that the correct formulation of the "nearness"
notion is via a metric. Similar reasons motivate the study of Lincoln, Mitchell,
Mitchell and Scedrov [LMMS98], our previous study of metrics for labelled Markov processes
[DGJP99, DGJP04, DGJP02], the study of the fine structure of these metrics by van
Breugel and Worrell [vBW01b, vBW01a], and the study by de Alfaro, Henzinger and Majumdar
of metrics for probabilistic games [dAHM03].

In contrast to these papers, in the present paper we focus on real-time probabilistic 
systems that combine continuous time and probability. We consider generalized semi-Markov 
processes (GSMPs). Semi-Markov processes strictly generalize continuous-time Markov 
chains by permitting general (i.e. non-exponential) probability distributions; GSMPs further 
generalize them by allowing competition between multiple events, each driven by a different 
clock. 

Following the format of the usual definition of bisimulation as a maximum fixed point, 
we define a metric on configurations of a GSMP as a maximum fixed point. This permits us 
to use analogues of traditional coinductive methods to reason about metric distances. For 
example, in exact reasoning, to deduce that two states are equivalent, it suffices to produce 
a bisimulation that relates the states. In our setting, to show that the distance between
two states is less than $\epsilon$, it suffices to produce a (metric) bisimulation that sets the distance
between the states to be less than $\epsilon$.

Viewing metric distance as bisimilarity, we get a definition of bisimulation for GSMPs, 
a class that properly includes CTMCs. In contrast to existing work on bisimulation for 
general probability distributions (e.g. [BG02, Her02]) our definition accounts explicitly for
the change of probability densities over time. 

Secondly, we demonstrate that our study does not rely on any "ad-hoc" construction 
of metric distances. Uniform spaces capture the essential aspects of metric distances by 
axiomatizing the structure needed to capture relative distances - e.g. statements of the form 
"x is closer to y than to z." A metric determines a uniform space but different metrics can
yield the same uniform space. Uniform spaces represent more information than topological 
spaces but less than metric spaces, so we are identifying, as precisely as we can, the intrinsic 
meaning of the quantitative information. We present our maximal fixpoint construction as 
a construction on uniform spaces, showing that the numerical values of different metric 
representations are not used in an essential way. In particular, in our setting, it shows that 
the actual numerical values of the discount factors used in the definition of the metric do 
not play any essential role. 

Thirdly, we provide a "logical" characterization of the uniformity using a real-valued 
modal logic. In analogy to traditional completeness results, we prove that the uniformity 
notion induced by the real-valued modal logic coincides with the uniformity induced by the 
metric defined earlier. Our logic is intentionally chosen to prove this completeness result. 
It is not intended to be used as an expressive specification formalism to describe proper- 
ties of interest. Our framework provides an intrinsic characterization of the quantitative 
observables that can be accommodated - functions that are continuous with respect to the 
metric. 

Finally, we illustrate the use of such studies in reasoning by showing that several quanti- 
tative properties of interest are continuous with respect to the metric. Thus, if two processes 
are close in the metric then observable quantitative properties of interest are indeed close. 
For expository purposes, the list considered in this paper includes expected hitting time
and expected (cumulative and average) rewards. The tools used to establish these results are
"continuous mapping theorems" from stochastic process theory, and provide a general recipe 
to tackle other observables of interest. 

The rest of this paper is organized as follows. We begin with a review of the model
of GSMPs in Section 2. We then give a review of the basic ideas from stochastic process
theory: metrics on probability measures on metric spaces in Section 3 and the Skorohod $J_2$
metrics on timed traces in Section 4. We discuss timed traces in the context of GSMPs in
Section 5. We define metric bisimulation in Section 6. We present our construction in terms of
uniform spaces in Section 7. In the remaining sections, we show the completeness of the real-valued
modal logic and show that interesting quantitative observables are continuous functions.

2. Generalized semi-Markov processes 

GSMPs properly include finite state CTMCs while also permitting general probability
distributions. We describe GSMPs informally here, following the formal description
of [Whi80]. The key point is that in each state there are possibly several events that can
be executed. Each event has its own clock, running down at its own rate, and when the
first one reaches zero that event is selected for execution. Then a probabilistic transition
determines the final state and any new clocks are set according to given probability
distributions, defined by conditional density functions. The probability distribution over the
next states depends only on the current state and the event that has occurred: this is the 
"Markov" in semi-Markov. The clocks are reset according to an arbitrary distribution, not 
necessarily an exponential (memoryless) distribution: hence the "semi". We will consider 
finite-state systems throughout this paper. 

A finite-state GSMP over a set of atomic propositions AP has the following ingredients: 
(1) A finite set $S$ of states. Each state $s$ has an associated finite set of events $I(s)$, each
with its own clock (we use the same letter for an event and its clock) and a non-zero
rate for each clock in $I(s)$. A clock in $I(s)$ runs down at the constant rate associated
with it.

(2) A labelling function $\mathit{Props} : S \to 2^{AP}$ that assigns truth values to atomic propositions
in each state.

(3) A continuous probability density function $f(t; s, i; s', i')$, over time, for each $i \in I(s)$,
for each target state $s'$ and $i' \in I(s')$. This is used to define how clocks are reset
during a transition.

(4) For each $i \in I(s)$, a probabilistic transition function $\mathit{next}_i : S \times S \to [0,1]$. We
require $\sum_{s' \in S} \mathit{next}_i(s, s') = 1$.

We use $c, c'$ (resp. $r$) for vectors of clock values (resp. rates). We use the vector
operation $c - r_c t$ to indicate the clock vector resulting from evolution of each clock under
its rate for time $t$.

Definition 2.1. Let $s$ be a state. A generalized state is of the form $(s, c)$ where $c$ is a
vector of clock values indexed by $i \in I(s)$ that satisfies a uniqueness condition: there is a
unique clock in $I(s)$ that reaches 0 first.

We write $T((s, c))$ for the time required for the first clock (unique by the above definition)
to reach 0. We use $\mathcal{G}$ for the set of generalized states, and $g_s, g'_s, \ldots$ for generalized
states.

We describe the evolution starting in a generalized state $(s, c)$. Each clock in $c$ decreases
at its associated rate. By the uniqueness condition on generalized states, a unique clock
reaches 0 first. Let this clock be $i \in I(s)$. The distribution on the next states is determined
by the probabilistic transition function $\mathit{next}_i : S \times S \to [0, 1]$. For each target state $s'$,

• The clocks $i' \in I(s) \setminus I(s')$ are discarded.

• The new clocks $i' \in I(s') \setminus [I(s) \setminus \{i\}]$ get new initial time values assigned according
to the continuous probability density function $f(t; s, i; s', i')$.

• The remaining clocks in $I(s) \cap I(s')$ carry forward their time values from $s$ to $s'$.

The continuity condition on probability distributions ensures that this informal description
yields a legitimate Markov kernel [Whi80]. The semantics of a real-time probabilistic process
can be described as a discrete-time Markov process on generalized states. For each
generalized state, we associate a set of sequences of generalized states that arise following 
the prescription of the evolution given above. 
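The evolution described above can be sketched as a single simulation step. This is a minimal illustration under stated assumptions, not the formal construction of [Whi80]: the function name `gsmp_step`, the dictionary encodings of $I(s)$, the rates and $\mathit{next}_i$, the two-state example, and the uniform clock-reset sampler are all hypothetical choices made for the sketch. The uniqueness condition of Definition 2.1 is assumed to hold for the input.

```python
import random

def gsmp_step(state, clocks, events, rates, next_prob, sample_clock, rng):
    """Perform one transition from the generalized state (state, clocks).

    events[s]         -> set of events I(s)
    rates[(s, i)]     -> non-zero rate of clock i in state s
    next_prob[(s, i)] -> dict mapping target states to probabilities (next_i)
    sample_clock      -> hypothetical stand-in for sampling the density f(t; s, i; s', i')
    """
    # Time each clock needs to reach 0; the unique minimum selects the event.
    remaining = {i: clocks[i] / rates[(state, i)] for i in events[state]}
    winner = min(remaining, key=remaining.get)
    elapsed = remaining[winner]

    # Probabilistic choice of the target state according to next_winner.
    targets = next_prob[(state, winner)]
    target = rng.choices(list(targets), weights=list(targets.values()))[0]

    new_clocks = {}
    for j in events[target]:
        if j in events[state] and j != winner:
            # Surviving clock: carry forward its value after running down.
            new_clocks[j] = clocks[j] - rates[(state, j)] * elapsed
        else:
            # New clock (or the clock that just fired): draw a fresh value.
            new_clocks[j] = sample_clock(state, winner, target, j, rng)
    # Clocks in I(state) but not in I(target) are simply dropped.
    return elapsed, winner, target, new_clocks

# Hypothetical two-state example with unit rates.
events = {"up": {"fail", "tick"}, "down": {"repair", "tick"}}
rates = {(s, i): 1.0 for s in events for i in events[s]}
next_prob = {
    ("up", "tick"): {"up": 1.0}, ("up", "fail"): {"down": 1.0},
    ("down", "tick"): {"down": 1.0}, ("down", "repair"): {"up": 1.0},
}

def sample_clock(s, i, s2, j, rng):
    # Assumed reset distribution: uniform on [1, 2] for every clock.
    return rng.uniform(1.0, 2.0)

elapsed, winner, target, new_clocks = gsmp_step(
    "up", {"fail": 2.0, "tick": 0.5}, events, rates, next_prob,
    sample_clock, random.Random(0))
print(elapsed, winner, target, new_clocks)
```

Iterating `gsmp_step` yields a sequence of generalized states together with the holding times $T((s, c))$, which is the raw material for the timed traces used later.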

3. Pseudometrics

Definition 3.1. A pseudometric $m$ on a state space $S$ is a function $S \times S \to [0, 1]$ such
that:

$m(x, x) = 0$, $m(x, y) = m(y, x)$, $m(x, z) \leq m(x, y) + m(y, z)$.

A function $f : (M, m) \to (M', m')$ is Lipschitz if $(\forall x, y)\ m'(f(x), f(y)) \leq m(x, y)$.
We consider a partial order on pseudometrics on a fixed set of states $S$.

Definition 3.2. $\mathcal{M}$ is the class of pseudometrics on $S$ ordered as:

$m_1 \preceq m_2$ if $(\forall s, t)\ m_1(s, t) \geq m_2(s, t)$.

Lemma 3.3. $(\mathcal{M}, \preceq)$ is a complete lattice.

The top element $\top$ is the constant 0 function, and the bottom element is the discrete
metric [DGJP02]. Thus, any monotone function $F$ on $(\mathcal{M}, \preceq)$ has a complete lattice of
fixed points.

3.1. Wasserstein metric. The Wasserstein metric is actually a prescription for lifting the 
metric from a given (pseudo)metric space to the space of probability distributions on the 
given space. 

Let $(M, m)$ be a pseudometric space, and let $P, Q$ be probability measures on $M$. Then
$W(m)(P, Q)$ is defined by the solution to the following linear program ($h : M \to [0, 1]$ is
any function):

\[
\begin{array}{ll}
W(m)(P, Q) = & \sup_h \left( \int h \, dP - \int h \, dQ \right) \\
\text{subject to:} & \forall s \in M.\ 0 \leq h(s) \leq 1 \\
 & \forall s, s'.\ |h(s) - h(s')| \leq m(s, s').
\end{array}
\]

An easy calculation using the linear program shows that the distances on distributions
satisfy symmetry and the triangle inequality, so we get a pseudometric, written $W(m)$,
on distributions.

By standard results (see Anderson and Nash [AN87]), this is equivalent to defining
$W(m)(P, Q)$ as the solution to the following dual linear program (here $\rho$ is any measure on
$M \times M$, and $S$ and $S'$ are any measurable subsets):

\[
\begin{array}{ll}
W(m)(P, Q) = & \inf_\rho \int m \, d\rho \\
\text{subject to:} & \forall S.\ \rho(S \times M) = P(S) \\
 & \forall S'.\ \rho(M \times S') = Q(S') \\
 & \forall S, S'.\ \rho(S \times S') \geq 0.
\end{array}
\]

The Wasserstein construction is monotone on the lattice of pseudometrics.

Lemma 3.4. $m \preceq m' \Rightarrow W(m) \preceq W(m')$

Proof. Since $m \preceq m'$ means $m \geq m'$ pointwise, every feasible solution $h$ to the linear
program for $W(m')(P, Q)$ is also a feasible solution to the linear program for $W(m)(P, Q)$.
Hence $W(m)(P, Q) \geq W(m')(P, Q)$, and the result is immediate. □

We discuss some concrete examples to illustrate the distances yielded by this construction.
Let $(M, m)$ be a 1-bounded metric space, i.e. $m(x, y) \leq 1$ for all $x, y$. Let $1_x$ be the
unit measure concentrated at $x$.

Example 3.5. We calculate $W(m)(1_x, 1_{x'})$. The primal linear program, using the function
$h : M \to [0, 1]$ defined by $h(y) = m(x', y)$, yields $W(m)(1_x, 1_{x'}) \geq m(x, x')$. The dual
linear program, using the product measure, yields $W(m)(1_x, 1_{x'}) \leq m(x, x')$.

Example 3.6. Let $P, Q$ be such that for all measurable $U$, $|P(U) - Q(U)| < \epsilon$. For any
1-bounded function $h$, $|\int h \, dP - \int h \, dQ| < \epsilon$, since for any simple function $g$ with finite
range $\{v_1, \ldots, v_n\}$ dominated by $h$ (wlog assume $P(g^{-1}(v_i)) > Q(g^{-1}(v_i))$ for exactly $v_1, \ldots, v_k$):

\[
\begin{aligned}
\sum_i v_i P(g^{-1}(v_i)) - v_i Q(g^{-1}(v_i))
 &= \sum_i v_i \left[ P(g^{-1}(v_i)) - Q(g^{-1}(v_i)) \right] \\
 &\leq \sum_{i=1}^{k} v_i \left[ P(g^{-1}(v_i)) - Q(g^{-1}(v_i)) \right] \\
 &\leq \sum_{i=1}^{k} P(g^{-1}(v_i)) - Q(g^{-1}(v_i)) \\
 &= P(g^{-1}(\{v_1, \ldots, v_k\})) - Q(g^{-1}(\{v_1, \ldots, v_k\})) \\
 &< \epsilon
\end{aligned}
\]

So, $W(m)(P, Q) \leq \epsilon$.

In 1-bounded metric spaces, the Wasserstein metric is closely related to the Prohorov
metric, $\pi$, which metrizes the topology of weak convergence. We say that $P_n$ weakly converges
to $P$ if for all bounded continuous real-valued functions $f$, $\int f \, dP_n$ converges to $\int f \, dP$.
For any Borel set $A$, we write $A^\epsilon$ for $\{u : \exists v \in A.\ m(u, v) < \epsilon\}$. The Prohorov metric $\pi(P, Q)$
between two measures is defined by

\[ \pi(P, Q) = \inf \{ \epsilon > 0 : P(A) \leq Q(A^\epsilon) + \epsilon \text{ and } Q(A) \leq P(A^\epsilon) + \epsilon \text{ for all Borel sets } A \}. \]

The connection between the Wasserstein metric and the Prohorov metric (see [GS01] for a
tutorial presentation of various such relationships) is:

\[ \pi^2(P, Q) \leq W(m)(P, Q) \leq 2\pi(P, Q). \]

The following lemma is the key tool to approximate continuous probability distributions
by discrete distributions in a separable metric space. We use the lemma later with
the $U_i$, $i \geq 1$, being subsets of $\epsilon$-neighborhoods of a point and forming a finite
cover, with respect to $P$, of all but $\epsilon$ of the space (the rest being covered by $U_0$).

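Lemma 3.7. Let $P$ be a probability measure on $(M, m)$. Let $U_i$, $i = 0, 1, 2, \ldots, n$ be a finite
partition of the points of $M$ into measurable sets such that:

• $(\forall i \geq 1)\ [\mathrm{diameter}(U_i) \leq \epsilon]$ (see footnote 1)

• $P(U_0) \leq \epsilon$

Let $x_i$ be such that $x_i \in U_i$. Define a discrete probability measure $Q$ on $(M, m)$ by:
$Q(\{x_i\}) = P(U_i)$. Then:

\[ W(m)(P, Q) \leq 2\epsilon \]

Proof. Let $h : M \to [0, 1]$ be any function that satisfies $\forall s, s'.\ |h(s) - h(s')| \leq m(s, s')$. Then

\[
\begin{aligned}
\int h \, dP - \int h \, dQ
 &= \sum_i \int_{U_i} h \, dP - \int_{U_i} h \, dQ \\
 &= \sum_i \int_{U_i} h \, dP - h(x_i) P(U_i) \\
 &= \sum_i \int_{U_i} (h(x) - h(x_i)) \, dP \\
 &\leq \int_{U_0} 1 \, dP + \sum_{i > 0} \int_{U_i} \epsilon \, dP \\
 &\leq P(U_0) + \int \epsilon \, dP \\
 &\leq 2\epsilon
\end{aligned}
\]

The fourth step follows as $0 \leq h(x) \leq 1$ and, for $i \geq 1$, from our assumption on $U_i$, $m(x, x_i) \leq \epsilon$,
and from the constraint on $h$, $h(x) - h(x_i) \leq m(x, x_i) \leq \epsilon$. Thus $W(m)(P, Q) \leq 2\epsilon$. □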
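The bound of Lemma 3.7 can be checked numerically in a simple special case. The sketch below is an illustration under assumptions that are not in the text: the space is the unit interval with $m(x, y) = |x - y|$, $P$ is a uniform distribution on 1000 grid points, and the Wasserstein distance is computed with the standard one-dimensional formula $\int |F_P - F_Q|$ (on $[0, 1]$ the constraint $0 \leq h \leq 1$ does not bind, since a 1-Lipschitz $h$ can be shifted into $[0, 1]$). The helper names are hypothetical.

```python
def wasserstein_1d(p, q):
    """Wasserstein distance between two finite distributions on the real line,
    given as dicts point -> mass, computed as the integral of |F_p - F_q|."""
    points = sorted(set(p) | set(q))
    total, cdf_gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)
        total += abs(cdf_gap) * (right - left)
    return total

def discretize(p, eps):
    """Build Q as in Lemma 3.7: partition [0, 1] into cells of width eps
    (cell 0 plays the role of U_0) and move each cell's mass to one
    representative point, here the leftmost support point of the cell."""
    cells = {}
    for x, mass in p.items():
        cells.setdefault(int(x / eps), []).append((x, mass))
    q = {}
    for members in cells.values():
        representative = min(x for x, _ in members)
        q[representative] = sum(mass for _, mass in members)
    return q

n = 1000
eps = 0.1
# P: uniform on the midpoints of a grid of n cells, so no point sits on a cell boundary.
P = {(j + 0.5) / n: 1.0 / n for j in range(n)}
Q = discretize(P, eps)
distance = wasserstein_1d(P, Q)
print(len(Q), distance, distance <= 2 * eps)
```

In this example every cell has diameter below $\epsilon$, so the computed distance is well inside the $2\epsilon$ bound; the lemma only needs the weaker hypothesis that one cell of small mass may be arbitrarily large.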

1 The diameter of a set $S$ is $\sup\{m(x, y) \mid x, y \in S\}$.
4. Cadlag functions

Usually when one defines bisimulation one requires that "each step" of one process 
matches "each step" of the other process. What varies from situation to situation is the 
notion of step and of matching. In the present case there is no notion of atomic step: one 
has to match sequences instead. In the usual cases matching steps and matching sequences 
are equivalent so one works with steps as they are simpler. Here we have no choice: we 
have to work with timed sequences. 

The timed sequences that one works with are functions from [0, oo) to the state space. 
Since we have discrete transitions these functions are not continuous. It turns out that the 
class of functions most often used are the cadlag 2 functions. 

Definition 4.1. Let $(M, m)$ be a pseudometric space. $f : [0, \infty) \to M$ is cadlag if for any
decreasing sequence $\{t\} \downarrow t_0$

\[ \lim_{t \to t_0} f(t) = f(t_0) \]

and for any increasing sequence $\{t\} \uparrow t_0$

\[ \lim_{t \to t_0} f(t) \text{ exists.} \]

We write $D_{(M,m)}[0, \infty)$ (or $D_m[0, \infty)$, when $M$ is clear from context) for cadlag functions
with range $(M, m)$.

These functions have very nice properties: for example, they have at most countably
many discontinuities. More to the point perhaps, if one fixes an $\epsilon > 0$, then in any bounded
interval there are at most finitely many jumps higher than $\epsilon$, so all but finitely many jumps
are small.

The study of metrics on spaces of these functions was initiated by Skorohod [Sko56]; see
Whitt's book [Whi02] for an expository presentation. Skorohod defined several metrics: we
use one called the $J_2$ metric. The most naive metric that one can define is the sup metric.
This fails to capture the convergence properties that one wants: it insists on comparing two
functions at the exact same points. Skorohod's first metric (the $J_1$ metric) allows one to
perturb the time axis so that functions which have nearby values at nearby points are close.
The $J_1$ metric also fails to satisfy certain convergence properties and we use the $J_2$ metric
defined below, which, like the $J_1$ metric, allows one to compare nearby time points.

Let $(M, m)$ be a metric space. Let $|\cdot|$ be the metric on the positive reals $R^+$ defined by
$|\cdot|(r, r') = |r - r'|$.

Definition 4.2 (Skorohod $J_2$ metric).

Let $(M, m)$ be a metric space. Let $f, g$ be cadlag functions $[0, \infty) \to M$. $J(m)(f, g)$ is
defined as:

\[
\max\Big( \sup_{t} \inf_{t'} \big[ \max(m(f(t), g(t')), |t - t'|) \big],\
          \sup_{t'} \inf_{t} \big[ \max(m(f(t), g(t')), |t - t'|) \big] \Big)
\]

2 This is an acronym for the French phrase "continue à droite, limite à gauche", meaning "continuous on
the right with left limits."
Thus the $J_2$ distance between two functions is the Hausdorff distance between their
graphs (i.e. the set of points $(x, f(x))$) in the space $[0, \infty) \times M$ equipped with the metric
$d((x, s), (y, t)) = \max(|x - y|, m(s, t))$.
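The Hausdorff reading of the $J_2$ distance suggests a direct numerical approximation. The sketch below is only an approximation under stated assumptions: the functions are restricted to a finite horizon, their graphs are sampled on a uniform time grid (so the result is accurate only up to roughly the grid step), and the function name `j2_distance` and the example step functions are hypothetical.

```python
def j2_distance(f, g, horizon, step, m=lambda x, y: abs(x - y)):
    """Approximate J(m)(f, g) on [0, horizon] by the Hausdorff distance
    between the graphs of f and g sampled on a uniform time grid."""
    n = int(round(horizon / step))
    times = [i * step for i in range(n + 1)]
    graph_f = [(t, f(t)) for t in times]
    graph_g = [(t, g(t)) for t in times]

    def d(a, b):
        # Metric on [0, oo) x M from the text: max of time gap and value gap.
        return max(abs(a[0] - b[0]), m(a[1], b[1]))

    def one_sided(src, dst):
        # sup over points of src of the distance to the closest point of dst.
        return max(min(d(a, b) for b in dst) for a in src)

    return max(one_sided(graph_f, graph_g), one_sided(graph_g, graph_f))

# Two step functions whose jumps are 0.1 apart (jump times chosen off the grid).
def early_jump(r):
    return 0.0 if r < 0.405 else 1.0

def late_jump(r):
    return 0.0 if r < 0.505 else 1.0

# A bump that returns to 0, compared against the constant function 0.
def bump(r):
    return 1.0 if 0.405 <= r < 0.505 else 0.0

def zero(r):
    return 0.0

jump_gap = j2_distance(early_jump, late_jump, horizon=1.0, step=0.01)
bump_gap = j2_distance(bump, zero, horizon=1.0, step=0.01)
print(jump_gap, bump_gap)
```

The first comparison gives a distance close to the gap between the jump times, since a jump can be matched by a nearby jump. The second gives 1 no matter how narrow the bump is, because a graph point at height 1 has no nearby point on the graph of the constant function 0.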

The following lemma is immediate from definitions. 

Lemma 4.3. $m_1 \preceq m_2 \Rightarrow J(m_1) \preceq J(m_2)$

The next lemma is standard, e.g. see Billingsley [Bil99].

Lemma 4.4 (Skorohod).

If $(M, m)$ is separable, $D_M[0, \infty)$ is a separable space with a countable basis given by
piecewise constant functions with finitely many discontinuities and finite range contained
in a basis of $M$.

We consider a few examples to illustrate the metric; see the book by Whitt [Whi02]
for a detailed analysis of Skorohod's metrics. The first example shows that jumps/discontinuities
can be matched by nearby jumps.

Example 4.5. [Whi02] Let $\{b_n\}$ be an increasing sequence that converges to $\frac{1}{2}$. Consider:

\[ f_{b_n}(r) = \begin{cases} 0, & r < b_n \\ 1, & r \geq b_n \end{cases} \]

[Figure: graphs of the step functions $f_{b_n}$.]

The sequence $\{f_{b_n}\}$ converges to $f_{\frac{1}{2}}$.

The next example shows that a single jump can be matched by multiple nearby jumps. 



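Example 4.6. [Whi02] Let $\{a_n\}, \{c_n\}$ be increasing sequences that converge to $\frac{1}{2}$ such
that $a_n < c_n$. Let:

\[ g_n(r) = \begin{cases} 0, & r < a_n \\ 1, & a_n \leq r < c_n \\ 0, & c_n \leq r < \frac{1}{2} \\ 1, & r \geq \frac{1}{2} \end{cases} \]

[Figure: graphs of the functions $g_n$.]

The sequence $\{g_n\}$ converges to $f_{\frac{1}{2}}$.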
The next two non-examples show that "jumps are detected". In a later section, we develop
a real-valued modal logic that captures the reasoning behind these two non-examples.
Here, to provide preliminary intuitions, we provide a preview of this development in a specialized
form. Given a cadlag function $f$ with range $[0, 1]$ and the standard metric, and a
Lipschitz function $h$ on $[0, 1]$, let $C(h)(f)$ be defined as

\[ C(h)(f)(t) = \sup \{ h(f(t')) - |t' - t| \mid t' \in [0, \infty) \} \]

In this definition, view $h$ as a test performed on the values taken by the function $f$. Since
$h$ is a Lipschitz function on $[0, 1]$, the results of such tests are smoothed out, and include
the analogue of (logical) negation via the operation $1 - (\cdot)$ and smoothed conditionals via
$h_q(x) = \max(0, x - q)$ that correspond to a "greater than $q$" test. $C(h)(f)$ also performs
an extra smoothing operation over time, so that the values of $C(h)(f)$ at times $t, t'$ differ by
at most $|t - t'|$. We can show that if $J(m)(f, g) < \epsilon$, then $\forall h.\ \forall t.\ |C(h)(f)(t) - C(h)(g)(t)| \leq 2\epsilon$.
We will use this to establish non-convergence of function sequences in the $J$-metric in the
next two examples.
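The operator $C(h)(f)$ is easy to approximate numerically, which makes the "jumps are detected" argument concrete. The sketch below is illustrative only: the supremum over $[0, \infty)$ is replaced by a maximum over a uniform grid on a finite horizon, and the function name `smooth_test`, the test function `near_one` and the bump function are hypothetical choices.

```python
def smooth_test(h, f, t, horizon, step):
    """Approximate C(h)(f)(t) = sup { h(f(t')) - |t' - t| } by a maximum
    over the grid points t' = 0, step, 2*step, ... up to horizon."""
    n = int(round(horizon / step))
    return max(h(f(i * step)) - abs(i * step - t) for i in range(n + 1))

EPS = 0.25

def near_one(r):
    # Lipschitz test on [0, 1] that fires only for values within EPS of 1.
    return max(0.0, EPS - abs(1.0 - r))

def bump(r):
    # Takes the value 1 just before time 1/2 and the value 0 elsewhere.
    return 1.0 if 0.4 <= r < 0.5 else 0.0

def zero(r):
    return 0.0

step = 0.001
at_bump = smooth_test(near_one, bump, 0.5, horizon=1.0, step=step)
at_zero = smooth_test(near_one, zero, 0.5, horizon=1.0, step=step)
print(at_bump, at_zero)
```

The two values differ by roughly `EPS`, and this gap does not shrink when the bump is made narrower. By the $2\epsilon$ bound stated above, such a persistent gap rules out convergence in the $J$-metric.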

The first non-example shows that jumps are detected — a sequence of functions with 
jumps cannot converge to a continuous function. 

Example 4.7. Let $\{b'_n\}$ be an increasing sequence that converges to $\frac{1}{2}$. Consider:

\[ f_{b'_n}(r) = \begin{cases} 0, & r < b'_n \\ 1, & b'_n \leq r < \frac{1}{2} \\ 0, & \frac{1}{2} \leq r \end{cases} \]

[Figure: graphs of the functions $f_{b'_n}$.]

The sequence $\{f_{b'_n}\}$ does not converge to the constant function 0, as $J(m)(f_{b'_n}, 0) = 1$,
because the point $(b'_n, 1)$ has distance 1 from the graph of 0. Alternatively, this can be
illustrated by considering the Lipschitz operator $h'_\epsilon$ on $[0, 1]$ defined as:

\[ h'_\epsilon(r) = \max(0, \epsilon - |1 - r|) \]

For all $n$, $C(h'_\epsilon)(f_{b'_n})(\frac{1}{2}) = \epsilon$, but $C(h'_\epsilon)(0)(\frac{1}{2}) = 0$.

The second non-example shows that continuous functions do not approximate a function 
with jumps. 

Example 4.8. [Whi02] Let $\{\epsilon_n\}$ be a decreasing sequence that converges to 0. Let
$d_n = \frac{1}{2} - \frac{\epsilon_n}{2}$, $e_n = \frac{1}{2} + \frac{\epsilon_n}{2}$. Consider:

\[ h_n(r) = \begin{cases} 0, & r < d_n \\ \frac{r - d_n}{e_n - d_n}, & d_n \leq r < e_n \\ 1, & r \geq e_n \end{cases} \]

[Figure: graphs of the functions $h_n$.]

The sequence $\{h_n\}$ does not converge to $f_{\frac{1}{2}}$. The point $(1/2, 1/2)$ is at distance $1/2$ from
the graph of $f_{\frac{1}{2}}$, so $J(m)(h_n, f_{\frac{1}{2}}) \geq 1/2$. To analyze in terms of the operator $C$, consider
the Lipschitz operator $h_\epsilon$ on $[0, 1]$ defined as:

\[ h_\epsilon(r) = \max(0, \epsilon - |\tfrac{1}{2} - r|) \]

For all $n$, $C(h_\epsilon)(h_n)(\frac{1}{2}) = \epsilon$, but $C(h_\epsilon)(f_{\frac{1}{2}})(\frac{1}{2}) = 0$.

We conclude this section with a discussion of a delay operator on the space of cadlag 
functions. 

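Definition 4.9. Let $(M, m)$ be a metric space. Let $f \in D_{(M,m)}[0, \infty)$. Let $s \in M$, $0 < r$.
Let $u : [0, r) \to M$ be a continuous function. Define $\mathrm{delay}_u(f) \in D_{(M,m)}[0, \infty)$ as follows:

\[ \mathrm{delay}_u(f)(t) = \begin{cases} u(t), & t < r \\ f(t - r), & t \geq r \end{cases} \]

The distance between a cadlag function and its $r$-delayed version is no greater than $r$.

Lemma 4.10. Let $0 < r$. Let $u : [0, r) \to M$ be continuous such that $(\forall\, 0 \leq r' < r)\ m(u(r'), f(0)) < r$.
Then: $J(m)(\mathrm{delay}_u(f), f) \leq r$.

Proof. Match the graph point of $\mathrm{delay}_u(f)$ at time $t \geq r$ with the graph point of $f$ at time $t - r$:
the values agree and the times differ by $r$. Match each graph point at time $t < r$ with $(0, f(0))$:
by assumption $m(u(t), f(0)) < r$, and $|t - 0| < r$. □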

5. GSMPs and cadlag functions

We deal with the temporal aspects of GSMPs next by constructing cadlag functions for 
paths: i.e. sequences of generalized states of a GSMP. 

A sequence of generalized states is finitely varying if it is non-Zeno, i.e. for any $i$,
$\sum_{j \geq i} T((s_j, c_j))$, the sum of the times spent at each generalized state, diverges. Any finitely
varying sequence of generalized states $(s_i, c_i)$ generates $f : [0, \infty) \to \mathcal{G}$ as follows:

• $f(t) = (s_i, c)$, where

\[ \sum_{k=0}^{i} T((s_k, c_k)) \leq t < \sum_{k=0}^{i+1} T((s_k, c_k)) \]

and $c = c_i - r_c \, |t - \sum_{k=0}^{i} T((s_k, c_k))|$ is the vector of new clock values after evolving at rate
$r$ for time $|t - \sum_{k=0}^{i} T((s_k, c_k))|$ starting from $(s_i, c_i)$.

Such finitely varying traces satisfy the following: for any interval $[t, t']$, there is a finite
partition $t = t_0 < t_1 < t_2 \ldots < t_n = t'$ such that:

• If $f(t_i) = (s, c)$, then: $(\forall t_i \leq t < t_{i+1})\ f(t) = (s, c - r_c |t - t_i|)$.
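The passage from a sequence of generalized states to a function of time can be sketched as follows. This is a minimal illustration for a finite prefix of a path, with assumptions that are not part of the text: visits are indexed so that visit $i$ starts when the holding times of all earlier visits have elapsed (an indexing convention chosen for the sketch), and the names `holding_time` and `trace_function` and the dictionary encoding of rates are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def holding_time(state, clocks, rates):
    """T((s, c)): time until the first clock of the generalized state reaches 0."""
    return min(value / rates[(state, event)] for event, value in clocks.items())

def trace_function(path, rates):
    """Turn a finite list of generalized states (state, clocks) into a
    right-continuous function f(t) defined up to the total holding time."""
    ends = list(accumulate(holding_time(s, c, rates) for s, c in path))

    def f(t):
        i = bisect_right(ends, t)  # first visit that ends strictly after t
        if i == len(path):
            raise ValueError("t lies beyond the finite prefix of the path")
        state, clocks = path[i]
        start = ends[i - 1] if i > 0 else 0.0
        elapsed = t - start
        # Between jumps, each clock runs down linearly at its own rate.
        evolved = {e: v - rates[(state, e)] * elapsed for e, v in clocks.items()}
        return state, evolved

    return f

# Hypothetical two-visit path with unit rates: clock y fires first in state a.
rates = {("a", "x"): 1.0, ("a", "y"): 1.0, ("b", "x"): 1.0}
path = [("a", {"x": 2.0, "y": 1.0}), ("b", {"x": 1.0})]
f = trace_function(path, rates)
print(f(0.5), f(1.0), f(1.25))
```

Using `bisect_right` makes the function right-continuous at jump times: at the instant a clock fires, `f` already reports the next generalized state, as required of a cadlag function.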
We write $\mathit{Traces}(s, c)$ for the set of traces that start with $(s, c)$. The probability distributions
associated with initial clock values at states ($f_s$) and transitions ($\mathit{next}$) induce a
probability measure on $\mathit{Traces}(s, c)$. The paths that are Zeno have measure zero, so the
finitely-varying paths generate $\mathit{Traces}(s, c)$ in measure.

Arbitrarily close approximations to the distance between finitely-varying functions
$f, g \in D_{(\mathcal{G},m)}[0, \infty)$ are forced by the distances between the values of $f, g$ at finitely many
points of time. This lemma is useful later on to show that our coinductive definition of
metric has closure ordinal $\omega$.

Lemma 5.1. Let $m$ be a pseudometric over generalized states. Let $f, g \in D_{(\mathcal{G},m)}[0, \infty)$ be
finitely-varying functions such that $J(m)(f, g) > \delta$.

Then there is a finite subset $\mathcal{G}_{fin} \subseteq \mathcal{G}$ and $\epsilon > 0$ such that for any $m'$, $J(m')(f, g) > \delta$
if $(\forall g_1, g_2 \in \mathcal{G}_{fin})\ [m'(g_1, g_2) \geq m(g_1, g_2) - \epsilon]$.
Proof. If $J(m)(f, g) > \delta$, without loss of generality there is a $t$ such that the Hausdorff
distance of $(f(t), t)$ from $\mathrm{Graph}(g)$ is greater than $\delta + \gamma$ for some $\gamma > 0$. Consider the
bounded interval $[t - \delta, t + \delta]$. By finite variation of $g$, we have a partition $t_0 = t - \delta < t_1 <
t_2 \ldots < t_n = t + \delta$ such that:

• $(\forall t_i \leq t < t_{i+1})\ g(t) = (s, c - r_c (t - t_i))$, where $g(t_i) = (s, c)$

• $|t_i - t_{i+1}| < \frac{\gamma}{2}$

• If $t', t''$ are in the same partition, then $m(g(t'), g(t'')) < \frac{\gamma}{2}$

Construct $\mathcal{G}_{fin} = \{(s_1, c_1), \ldots, (s_n, c_n)\}$ by choosing one $(s_i, c_i)$ each from $g[t_{i-1}, t_i]$. Choose
$\epsilon = \frac{\gamma}{2}$. For any $m'$ on generalized states such that:

\[ (\forall i, j)\ m'((s_i, c_i), (s_j, c_j)) \geq m((s_i, c_i), (s_j, c_j)) - \epsilon \]

the Hausdorff distance of $(f(t), t)$ from $\mathrm{Graph}(g)$ is greater than $\delta$ by construction. □

6. Bisimulation style definition of metric

Let $\mathcal{M}$ be the class of pseudo-metrics on generalized states that satisfy:

\[ \mathit{Props}(s) \neq \mathit{Props}(s') \Rightarrow m((s, c), (s', c')) = 1 \]

where $\mathit{Props}(s)$ is the set of atomic propositions true in state $s$. We order these pseudo-metrics
as in Section 3: $m_1 \preceq m_2$ if $(\forall s, t)\ m_1(s, t) \geq m_2(s, t)$. Fix $0 < k < 1$. Define a
functional $\mathcal{F}_k$ on $\mathcal{M}$:

Definition 6.1. $\mathcal{F}_k(m)((s, c), (s', c')) < \epsilon$ if

\[ k \times W(J(m))(\mathit{Traces}(s, c), \mathit{Traces}(s', c')) < \epsilon \]

In this definition, view $k$ as a discount factor. In the next section, we will show that
the choice of $k$ does not affect the essential character of the metric. "Type-checking" of
this definition provides some intuitions: $m$ is a pseudo-metric on generalized states. $J(m)$,
following Skorohod $J_2$, is a pseudo-metric on finitely varying sequences of generalized states.
$W(J(m))$, following Wasserstein, is a pseudo-metric on probability distributions on finitely
varying sequences of generalized states.

As an immediate consequence of Lemmas 4.3 and 3.4:

Lemma 6.2. $\mathcal{F}_k$ is monotone on $\mathcal{M}$.

Since $(\mathcal{M}, \preceq)$ is a complete lattice, $\mathcal{F}_k$ has a maximum fixed point, $m_{\mathcal{F}_k}$.

Definition 6.3. $m$ is a metric-bisimulation if $m \preceq \mathcal{F}_k(m)$.

It is well known that the greatest fixed point of $\mathcal{F}_k$ is given by:

\[ m_{\mathcal{F}_k} = \bigsqcup \{ m \mid m \text{ is a metric-bisimulation} \} \]

Thus, a metric-bisimulation $m$ provides an upper bound on the distances assigned by $m_{\mathcal{F}_k}$.

Consider the equivalence relation $\simeq\ = \{((s, c), (s', c')) \mid m_{\mathcal{F}_k}((s, c), (s', c')) = 0\}$. $\simeq$
describes a notion of bisimulation and is explicitly defined as follows.

Let $\mathcal{M}_{\{0,1\}}$ be the sublattice of $\mathcal{M}$ consisting of metrics whose range is $\{0, 1\}$, i.e. all
distances are either 0 or 1. $\mathcal{M}_{\{0,1\}}$ is essentially the class of equivalence relations. A simple
proof shows that:

\[ \simeq\ = \bigsqcup \{ m \mid m \in \mathcal{M}_{\{0,1\}},\ m \preceq \mathcal{F}_k(m) \} \]

As an example of metric-reasoning, we now show that generalized states with the same
state, but clock values reflecting evolution for a time $t$, are $m_{\mathcal{F}_k}$-close.

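Lemma 6.4. Define a pseudo-metric $m$ on generalized states as follows:

\[ m((s, c), (s', c')) = \begin{cases} \min(1, t), & \text{if } s = s' \text{ and } c = c' + r_c t \text{ or } c = c' - r_c t \\ 1, & \text{otherwise} \end{cases} \]

Then: $m \preceq \mathcal{F}_k(m)$.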

Proof. It suffices to prove that for all generalized states $(s, c), (s', c')$

\[ \mathcal{F}_k(m)((s, c), (s', c')) \leq m((s, c), (s', c')). \]

The only case to consider is when $s = s'$; wlog assume $c < c'$.
Define $u : [0, t) \to \mathcal{G}$ as follows.

\[ u(t') = (s, c' - r_c t') \]

With this definition, it is clear that:

\[ \mathit{Traces}(s', c') = \{ \mathrm{delay}_u(f) \mid f \in \mathit{Traces}(s, c) \} \]

with the distribution inherited from $\mathit{Traces}(s, c)$. Now this induces a matching between
the traces in $\mathit{Traces}(s, c)$ and those of $\mathit{Traces}(s', c')$. This in turn induces a distribution
on the product space $\mathit{Traces}(s, c) \times \mathit{Traces}(s', c')$. Using this distribution as the $\rho$ in the
dual form of the definition we get the result. □

Let $m_0 = \top$ and $m_{i+1} = \mathcal{F}_k(m_i)$. The role played by the discount constant $k$ is captured
in the following fact:

\[ (\forall (s, c), (s', c'))\ |m_{n+1}((s, c), (s', c')) - m_n((s, c), (s', c'))| \leq k^{n+1}. \]

This is the key step in the proof of the following lemma. 

Lemma 6.5. $m_{\mathcal{F}_k}$ is separable.

Proof. We first note that if $m, m'$ are such that $(\forall s, s')\ |m(s, s') - m'(s, s')| < \delta$, then:

• $(\forall P, Q)\ |W(m)(P, Q) - W(m')(P, Q)| < \delta$; this is immediate from the dual formulation.

• $(\forall f, g)\ |J(m)(f, g) - J(m')(f, g)| \leq \delta$; this is immediate from the definition.

We prove by induction on $n$ that

\[ (\forall (s, c), (s', c'))\ |m_{n+1}((s, c), (s', c')) - m_n((s, c), (s', c'))| \leq k^{n+1}. \]

• Base, $n = 0$. Follows from the fact that $m_0$ is the constant 0 function and $m_1$ is
bounded above by $k$.

• Induction. Assume $(\forall (s, c), (s', c'))\ |m_{n+1}((s, c), (s', c')) - m_n((s, c), (s', c'))| \leq k^{n+1}$.
Thus, for any $(s, c), (s', c')$, $W(J(m_{n+1}))$ and $W(J(m_n))$ differ
by at most $k^{n+1}$. So:

\[ |m_{n+2}((s, c), (s', c')) - m_{n+1}((s, c), (s', c'))| \leq k^{n+2} \]

Let $m = \sup_i m_i$. From above, $(\forall (s, c), (s', c'))$

\[ |m((s, c), (s', c')) - m_n((s, c), (s', c'))| \leq \frac{k^{n+1}}{1 - k} \]

Thus, an $\epsilon$ ball around $(s, c)$ wrt the metric $m$ can be realized as the countable union of
open sets wrt the metrics $m_n$. The result now follows from the separability of the metrics $m_n$.

□
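The geometric bound $|m_{n+1} - m_n| \leq k^{n+1}$ can be observed on a deliberately simplified toy system. The sketch below is not the functional $\mathcal{F}_k$ of the paper: it assumes an untimed, deterministic system in which each state has a single successor, so that the Skorohod step disappears and the Wasserstein lifting of two point masses collapses to the underlying distance (Example 3.5). The names `iterate_metric`, `labels` and `succ` are hypothetical.

```python
def iterate_metric(labels, succ, k, rounds):
    """Iterate a toy discounted functional starting from the top element:
    distance 1 for differently labelled states, otherwise k times the
    previous distance between the unique successors."""
    states = list(labels)
    # m_0: the top element (distance 0 wherever the labels agree).
    m = {(s, t): 0.0 if labels[s] == labels[t] else 1.0
         for s in states for t in states}
    history = [m]
    for _ in range(rounds):
        m = {(s, t): 1.0 if labels[s] != labels[t] else k * m[(succ[s], succ[t])]
             for s in states for t in states}
        history.append(m)
    return history

# Two chains 0 -> 1 -> 2 -> 2 and 3 -> 4 -> 5 -> 5 that differ only in the
# label of their final state.
labels = {0: "a", 1: "a", 2: "a", 3: "a", 4: "a", 5: "b"}
succ = {0: 1, 1: 2, 2: 2, 3: 4, 4: 5, 5: 5}
k = 0.5
history = iterate_metric(labels, succ, k, rounds=5)

# Largest change between consecutive iterates, to compare with k ** (n + 1).
steps = [max(abs(history[n + 1][p] - history[n][p]) for p in history[n])
         for n in range(len(history) - 1)]
print(history[-1][(0, 3)], steps)
```

In this toy system a difference that becomes observable only after $n$ steps contributes $k^n$ to the distance, which is the discounting behaviour that the constant $k$ is meant to capture.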

The separability of rn^ k enables one to prove the analogue of lemma 15.11 
Lemma 6.6 (Finite detectability of distances). 

Let m be a pseudometric on Q with countable basis. Let J r / C (m)((s, c), (s',c'}) > 5. 

Then there is a finite subset Qfi n C Q and e > such that for any metric m' >z m, 
F k (m')((s,c},(s',c>)) > 6 if (Vg s ,g' s G G fin ) [m'(g s ,g' s ) > m(g s , g' s ) - e]. 

Proof. 

Let J r fc (m)((s,c),(s',c / }) > <5 + 7, for 7 > 0. 

From separability of m, lemma FOl yields separability of J(m). Let p be the measure 
induced on the space of all traces by Traces((s, c}). Using separability of J(m), we can get a 
finite partition Uo, U%, . . . ,U n of Q satisfying diameter([/j) < A for i > 1, and p(?7o) < jq- 
Using lemma E3 with gives us a finite set of traces L\ = {fi \ G Ui,i > 0} with 
probabilities j»j = -P(/i) = p(U{). Similarly applying lemma ETT1 to Traces((s', c')) gives 
us another finite set L 2 = {f[ \ i} with probabilities p[ given by the measure induced by 
Traces((s', c')). We then have from lemma ETT1 

• IU(J(m))(Traces((s,c)),Li) < | 

• T^(J(m))(Traces((s',c')),L 2 ) < | 

So, W{J{m)){L x ,L 2 ) >6 + % . 

Using corollary 15.11 for every pair in $\{f_i \mid i\} \times \{f'_j \mid j\}$ yields a finite set $\mathcal{G}_{fin} \subseteq \mathcal{G}$
and $\epsilon > 0$ such that for any $m'$, if $(\forall g_s, g'_s \in \mathcal{G}_{fin})\ [m'(g_s,g'_s) > m(g_s,g'_s) - \epsilon]$, then
J(m')(fi, fj) > J(m)(/j, /j) — ^ for all i, j. A simple use of the dual form of the Wasserstein 
metric yields W( J(m'))(Li, L 2 ) > W(J{mj){L\, L 2 ) — ? under these conditions. 

Now since m' >z m, W( J(m'))(Traces((s, £)), L x ) < VU( J(m))(Traces((s, c)), L x ) < |, 
and similarly VF( J(m'))(Traces((s', c')), L 2 ) < ^. The triangle inequality then gives us 
that < F fc (m')((s,c),(s',c')) > 5. □ 
Lemma 6.7. $\mathcal{F}_k$ has closure ordinal $\omega$.

Proof. The proof proceeds by showing that the maximum fixed point $m$ is given by $m = \sqcap_i m_i$,
where $m_0 = \top$ and $m_{i+1} = \mathcal{F}_k(m_i)$.

Let $m(\langle s,c\rangle,\langle s',c'\rangle) > \delta$. From lemma EH we deduce the finitely many conditions of
the form

$m'(\langle s_i,c_i\rangle,\langle s'_i,c'_i\rangle) > m(\langle s,c\rangle,\langle s',c'\rangle) - \epsilon$

that suffice to ensure that $\mathcal{F}_k(m')(\langle s,c\rangle,\langle s',c'\rangle) > \delta$. Each of these finitely many conditions
is met at a finite index; therefore by $\omega$ they are all met and the result follows. □

7. Uniform spaces 

A metric captures a quantitative notion of distance or "nearness"; a topology captures
a qualitative notion of nearness: a topology is enough to talk about convergence and
continuity. A topology is, however, not enough to capture a notion of relative distance.
One cannot say "x is closer to y than it is to z" on the basis of a topology alone. A uniform
space (see, for example, [Ger85] for a quick survey) captures the essence of the relative
distance notion in metric spaces: if there are points x, y, z such that x is closer to y than
to z, uniform spaces have enough data to capture this without committing to the actual
numerical values of the distances. The aim of this section is to show that our treatment is
"up to uniformity": this is a formal way of showing that there is no ad hoc treatment of
the quantitative metric distances. In particular, we show that different discount factors k
yield the same uniformity.

Let S be a set. 

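Definition 7.1. A pseudo-uniformity $\mathcal{U}$ is a collection of subsets of $S \times S$, called entourages,
that satisfies:

• $(\forall E \in \mathcal{U})\ (\forall x \in S)\ (x,x) \in E$

• $E \in \mathcal{U} \Rightarrow E^{-1} \in \mathcal{U}$

• $E \in \mathcal{U} \Rightarrow (\exists E' \in \mathcal{U})\ E' \circ E' \subseteq E$

• $E, E' \in \mathcal{U} \Rightarrow E \cap E' \in \mathcal{U}$

• $E \in \mathcal{U},\ E \subseteq E' \Rightarrow E' \in \mathcal{U}$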

One can think of the entourages as defining approximations to the identity relation,
just as the neighbourhood of a point can be thought of as an approximation of the point.
The first axiom says this, the second axiom is symmetry, the third is a truncated version
of transitivity, the fourth is closure under finite intersection, and the final axiom says that
a superset of an approximation to the identity is also an approximation to the identity. The
usual presentation of uniformities also includes
$\bigcap_{E \in \mathcal{U}} E = \{(x,x) \mid x \in S\}$, but this condition is not appropriate to our pseudo-metric
setting. In this paper, we will work with pseudo-uniformities, often dropping "pseudo" and
merely saying "uniformities".

Definition 7.2. A pseudo-uniform space is a pair $(S,\mathcal{U})$ where $\mathcal{U}$ is a pseudo-uniformity on
$S$.

There is a natural notion of map between uniform spaces. A morphism between uniform 
spaces generalizes uniformly-continuous functions. 

Definition 7.3. A morphism $f$ between (pseudo) uniform spaces $f : (S_1,\mathcal{U}_1) \to (S_2,\mathcal{U}_2)$ is
a function $f : S_1 \to S_2$ such that

$(\forall E_2 \in \mathcal{U}_2)\ f^{-1}(E_2) \in \mathcal{U}_1$

where $f : S_1 \times S_1 \to S_2 \times S_2$ is given by $f(x,y) = (f(x),f(y))$.

To gain intuition into this definition, we describe how a pseudometric generates a
pseudo-uniformity. Given a pseudo-metric $m$ on $S$, let $K^m_\epsilon = \{(x,y) \mid m(x,y) < \epsilon\}$ for
$\epsilon > 0$. We get a pseudo-uniformity by considering [Ger85, p. 218]:

$\mathcal{U} = \{E \subseteq S \times S \mid (\exists \epsilon)\ K^m_\epsilon \subseteq E\}$

Thus, if $m(x,y) < \epsilon$ and $m(x,z) > \epsilon$, there is an entourage that contains $(x,y)$ but not
$(x,z)$. Clearly if one just scales the metric this construction yields the same uniformity.
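
On a finite set the construction can be checked mechanically. The following is a minimal sketch, assuming a hypothetical four-point pseudometric (the names `basic_entourage`, `compose` and `position` are illustrative and not from the paper); it builds the basic entourages $K^m_\epsilon$ and tests reflexivity, symmetry, the "truncated transitivity" axiom, and the invariance under scaling noted above.

```python
from itertools import product

def basic_entourage(points, m, eps):
    """K_eps = {(x, y) | m(x, y) < eps} for a pseudometric m on a finite set."""
    return {(x, y) for x, y in product(points, repeat=2) if m(x, y) < eps}

def compose(e1, e2):
    """Relational composition: (x, z) whenever (x, y) in e1 and (y, z) in e2."""
    return {(x, z) for (x, y) in e1 for (y2, z) in e2 if y == y2}

# Hypothetical example: points 0 and 1 sit at the same position, so m is a
# pseudometric (distance 0 between distinct points) rather than a metric.
position = {0: 0.0, 1: 0.0, 2: 0.3, 3: 1.0}
points = list(position)

def m(x, y):
    return abs(position[x] - position[y])

eps = 0.5
k_eps = basic_entourage(points, m, eps)
k_half = basic_entourage(points, m, eps / 2)

reflexive = all((x, x) in k_eps for x in points)
symmetric = all((y, x) in k_eps for (x, y) in k_eps)
# K_{eps/2} composed with itself stays inside K_eps, by the triangle inequality.
half_composes = compose(k_half, k_half) <= k_eps
# Scaling the metric by 2 and the radius by 2 gives the same entourage.
scaled_same = basic_entourage(points, lambda x, y: 2 * m(x, y), 2 * eps) == k_eps
```

Here `(0, 2)` lies in `k_eps` while `(0, 3)` does not, which is the "closer to y than to z" information that a topology alone cannot express.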

Thus, all pseudometrics induce pseudo-uniformities, but the converse is not true: there
are pseudo-uniformities that are not induced by metrics. Two metrics $m, m'$ on the same
set induce the same uniformity if and only if the identity map is uniformly continuous in
both directions.

7.1. The lattice of uniformities. Consider uniformities induced by pseudo-metrics on a
fixed set of states $S$.

Definition 7.4. $\mathcal{PMU}$ is the class of pseudo-metrizable uniformities $\{\mathcal{U}_i\}$ on $S$ ordered as
follows.

$\mathcal{U}_1 \le \mathcal{U}_2$ if $\mathcal{U}_2 \subseteq \mathcal{U}_1$

This order is closely related to the order on the lattice of pseudometrics. 

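Lemma 7.5. Let pseudometrics $m_1, m_2$ induce $\mathcal{U}_1, \mathcal{U}_2$ respectively. Let $\mathcal{U}_2 \subseteq \mathcal{U}_1$. Then the
pseudometric $m$ defined as $m(s,t) = \max(m_2(s,t), m_1(s,t))$ also induces $\mathcal{U}_1$.

Proof. Let $\mathcal{U}$ be the uniformity induced by $m$. We need to show that $\mathcal{U} = \mathcal{U}_1$. Since
$m \ge m_1$ pointwise, $K^m_\epsilon \subseteq K^{m_1}_\epsilon$ for every $\epsilon$, so $\mathcal{U}_1 \subseteq \mathcal{U}$.

We now show that $\mathcal{U} \subseteq \mathcal{U}_1$. Consider an entourage $E \in \mathcal{U}$. Thus, there is an $\epsilon$ such
that $E \supseteq K^m_\epsilon$, i.e. $E \supseteq K^{m_1}_\epsilon \cap K^{m_2}_\epsilon$. But $K^{m_2}_\epsilon \in \mathcal{U}_2$ and, by the assumption $\mathcal{U}_2 \subseteq \mathcal{U}_1$, $K^{m_2}_\epsilon \in \mathcal{U}_1$.

So, $K^{m_1}_\epsilon \cap K^{m_2}_\epsilon \in \mathcal{U}_1$. So, $E \in \mathcal{U}_1$. □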

Lemma 7.6. $(\mathcal{PMU}, \le)$ is a complete lattice.

Proof. The least element is given by the discrete metric: $\bot(s,t) = 0$ if $s = t$, $1$ otherwise.
The top element has only one entourage, $S \times S$, and is induced by the pseudometric given by
$(\forall s,t)\ \top(s,t) = 0$. The greatest lower bound of $\{\mathcal{U}_i\}$ is given by $\bigcup_i \mathcal{U}_i$. □

7.2. Wasserstein uniformity. 

Lemma 7.7. Let (M, m), (M,m') be such that the uniformities induced by m,m' are the 
same. Then, W(m),W(m') induce the same uniformity. 

Proof. Since the uniformities induced by $m, m'$ are the same, the identity map $(M,m) \to (M,m')$
is uniformly continuous. We need to show that the identity map on distributions
with metrics $W(m)$ and $W(m')$ is uniformly continuous.

Since the uniformity induced by $W(m)$ (resp. $W(m')$) is the same as the uniformity
induced by the Prohorov metric $\pi(m)$ (resp. $\pi(m')$), it suffices to prove that the identity
map on distributions with metrics $\pi(m)$ and $\pi(m')$ is uniformly continuous.
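Fix $\epsilon > 0$; the goal is to ensure $\pi(m')(P,Q) < \epsilon$. Let $\delta_\epsilon$ be such that: $m(x,y) < \delta_\epsilon \Rightarrow m'(x,y) < \epsilon$. Let $\gamma =
\min(\epsilon, \delta_\epsilon)$. Then, writing $U^{\gamma}_{m}$ for the $\gamma$-neighbourhood of a set $U$ under $m$:

$\pi(m)(P,Q) < \gamma \Rightarrow P(U) < Q(U^{\gamma}_{m}) + \gamma$

$\Rightarrow P(U) < Q(U^{\epsilon}_{m'}) + \gamma$   $[U^{\gamma}_{m} \subseteq U^{\epsilon}_{m'}]$
$\Rightarrow P(U) < Q(U^{\epsilon}_{m'}) + \epsilon$   $(\gamma \le \epsilon)$
$\Rightarrow \pi(m')(P,Q) < \epsilon$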

□ 

In the light of this theorem, we write $W(\mathcal{U})$ for the uniformity generated by a pseudo-metrizable
uniformity $\mathcal{U}$. As a direct consequence of lemma l3~H we have:

Corollary 7.8. $\mathcal{U} \le \mathcal{U}' \Rightarrow W(\mathcal{U}) \le W(\mathcal{U}')$.

7.3. Skorohod J2 uniformity. A similar result holds for the Skorohod metric.

Lemma 7.9. The uniformity induced by the Skorohod J2 metric depends only on the uniformity
induced by $m$ on $M$.

Proof. For each $f \in D_M[0,\infty)$ consider its graph, $\mathrm{Graph}(f) = \{(f(t),t)\}$. For a relation $R$
and sets $X$, $Y$, write $R(Y) = \{x \mid (x,y) \in R, y \in Y\}$. Similarly, $(X)R = \{y \mid (x,y) \in R, x \in
X\}$. These operations are monotone, under the subset ordering, in both the set argument and $R$.

Let the space $\mathcal{S} = [0,\infty) \times M$ be equipped with the metric $d((x,s),(y,t)) = \max(|x -
y|, m(s,t))$. This metric induces a uniformity $\mathcal{U}(\mathcal{S})$ on $\mathcal{S}$.

Let $E \subseteq \mathcal{S} \times \mathcal{S}$ be an entourage in $\mathcal{U}(\mathcal{S})$. Consider the subset $J(E)$ of $D_M[0,\infty) \times
D_M[0,\infty)$ induced by $E$ as follows. $J(E)$ is the set of all $(f,g)$ such that:

• $(\mathrm{Graph}(f))E \supseteq \mathrm{Graph}(g)$
• $E(\mathrm{Graph}(g)) \supseteq \mathrm{Graph}(f)$

Consider $\mathcal{U} = \{S \mid S \supseteq J(E),\ E \subseteq \mathcal{S} \times \mathcal{S},\ E \in \mathcal{U}(\mathcal{S})\}$. We will show that $\mathcal{U}$ is the uniformity
generated by $J(m)$.

Rewriting $J(m)$ in this style: $J(m)(f,g) < \epsilon \iff$

• $(\mathrm{Graph}(f))K^{d}_\epsilon \supseteq \mathrm{Graph}(g)$
• $K^{d}_\epsilon(\mathrm{Graph}(g)) \supseteq \mathrm{Graph}(f)$

where $K^{d}_\epsilon = \{(u,v) \mid d(u,v) < \epsilon\}$.

Clearly, the uniformity generated by $J(m)$ is a subset of $\mathcal{U}$, since $K^{d}_\epsilon$ is an entourage
of $\mathcal{S} \times \mathcal{S}$, and thus considered in the definition of $\mathcal{U}$. Furthermore, any arbitrary entourage
$E$ in $\mathcal{U}(\mathcal{S})$ is a superset of one of these basic entourages, and yields a superset by monotonicity of
the relational operations.

The result now follows by the upward-closure axiom in the definition of $\mathcal{U}$ and the
uniformity generated by $J(m)$. □

The proof of this theorem also shows that the construction $J(m)$ yields the same uniformity
for other definitions of metrics on $\mathcal{S}$ that yield the same uniformity, e.g. the metric
$d((x,s),(y,t)) = |x - y| + m(s,t)$.
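
That the sum metric and the max metric induce the same uniformity follows from a two-sided comparison. A short derivation, where $d_{\max}$ and $d_{+}$ are names introduced here only for illustration:

```latex
\begin{align*}
d_{\max}((x,s),(y,t)) &= \max(|x-y|,\ m(s,t)), \qquad
d_{+}((x,s),(y,t)) = |x-y| + m(s,t), \\
d_{\max} &\le d_{+} \le 2\, d_{\max}
\quad\Longrightarrow\quad
K^{d_{+}}_{\epsilon} \subseteq K^{d_{\max}}_{\epsilon} \subseteq K^{d_{+}}_{2\epsilon}.
\end{align*}
```

Each basic entourage of one metric therefore contains a basic entourage of the other, so the two generate the same upward-closed family.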

In light of this theorem, we write $J(\mathcal{U})$ for the uniformity generated by a pseudo-metrizable
uniformity $\mathcal{U}$. As a direct consequence of lemma ESI we have:

Corollary 7.10. $\mathcal{S}_1 \le \mathcal{S}_2 \Rightarrow J(\mathcal{S}_1) \le J(\mathcal{S}_2)$
7.4. A functional on the lattice of uniformities. Combining the above results, we
deduce the existence of a monotone function $\mathcal{F}_k$ on the lattice of uniformities. This function
is insensitive to the actual numerical value of the discount factor $k$.

Lemma 7.11. $(\forall\, 0 < k, k' \le 1)\ \mathcal{F}_k = \mathcal{F}_{k'}$.

Proof. By Lemmas 7.7 and 7.9 we see that the functional $\mathcal{F}_k$ is defined up to uniformity. If
we change $k$ to $k'$ we are simply rescaling the metric; this clearly gives the same uniformity.

□ 

Furthermore, for any discount factor $0 < k < 1$, we get the same maximum fixed point
in the lattice of uniformities. In contrast to the above lemma, the following theorem relies
on $k \ne 1$.

Theorem 7.12. The maximum fixpoint in $(\mathcal{PMU}, \le)$ is the uniformity induced by $m_{\mathcal{F}_k}$,
the maximum fixpoint in $(\mathcal{M}, \preceq)$.

Proof. It suffices to show that the greatest lower bound of the $\mathcal{U}_i$, $\sqcap_i \mathcal{U}_i$, is the uniformity
induced by $\sup_i m_i$. Let $\mathcal{V}$ be the uniformity induced by $\sup_i m_i$. Since $m_i \le \sup_i m_i$ for all $i$,
we have that $\mathcal{U}_i \subseteq \mathcal{V}$ for all $i$. Thus $\sqcap_i \mathcal{U}_i \subseteq \mathcal{V}$ and hence the identity function from $\mathcal{V}$ to $\sqcap_i \mathcal{U}_i$
is uniformly continuous.

For the converse, we will show that the identity function from $\sqcap_i \mathcal{U}_i \to \mathcal{V}$ is uniformly
continuous. We know, from the proof of Lemma 6.5, that for any generalized states $g_s$
and $g'_s$

$\sup_i m_i(g_s,g'_s) - m_n(g_s,g'_s) < \frac{k^{n+1}}{1-k}$.

We choose $n$ and $\delta$ such that $\delta + \frac{k^{n+1}}{1-k} < \epsilon$. Now for any such $\delta$ and $n$ we have

$m_n(g_s,g'_s) < \delta \Rightarrow \sup_i m_i(g_s,g'_s) < \epsilon$.

This shows that for any $\epsilon$, some $K^{m_n}_\delta$ is contained in $K^{\sup_i m_i}_\epsilon$, and hence that the identity function
is uniformly continuous. □

This proof relies on the fact that $k < 1$; otherwise the $\delta$ would not be defined.
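
The choice of $n$ and $\delta$ in the proof can be made concrete. A minimal sketch, assuming the tail bound $k^{n+1}/(1-k)$ from the proof; the function name `choose_n_and_delta` and the even split of $\epsilon$ are illustrative choices, not the paper's:

```python
def choose_n_and_delta(k, eps):
    """Return (n, delta) with delta + k**(n+1)/(1-k) < eps, for 0 < k < 1, eps > 0."""
    if not (0 < k < 1):
        # For k = 1 the geometric tail bound is undefined, matching the remark above.
        raise ValueError("the tail bound k**(n+1)/(1-k) needs 0 < k < 1")
    if eps <= 0:
        raise ValueError("eps must be positive")
    delta = eps / 2  # one arbitrary way to split the budget between delta and the tail
    n = 0
    while k ** (n + 1) / (1 - k) >= eps / 2:
        n += 1  # the tail shrinks geometrically, so this loop terminates
    return n, delta

n, delta = choose_n_and_delta(0.5, 0.1)
```

With these values, $m_n(g_s,g'_s) < \delta$ forces $\sup_i m_i(g_s,g'_s) < \epsilon$, as in the proof.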

8. Examples 

In this section, we discuss several examples of the use of approximate reasoning techniques.
The general approach in this section is to identify natural quantitative observables,
already explored in the literature, that are amenable to approximation: to calculate
the observable at a state $\langle s,c\rangle$ up to $\epsilon$, it suffices to calculate it at a close-enough state $\langle s',c'\rangle$.
This is clearly implied by continuity of the observable with respect to the metric $m_{\mathcal{F}_k}$.

The main technical tool that we use to establish continuity of observables is a continuous
mapping theorem, e.g. see [Whi02, She82] for an introductory exposition.

Theorem 8.1 (Continuous mapping theorem).

Let $P_n$ be a sequence of probability distributions on $X$ that weakly converge to $P$. Let $U$
be a continuous function $X \to \mathbb{R}$. Then $\int U\, dP_n$ converges to $\int U\, dP$.
8.1. Expected time to hit a proposition. Let $p$ be a proposition. We consider the
expected time required to hit a $p$-state, i.e. a state where the proposition $p$ is true. Define
$\mathrm{Hit}_p : D_{m_\mathcal{F}}[0,\infty) \to [0,\infty]$:

$\mathrm{Hit}_p(f) = \inf\{t \mid f(t) = \langle s,c\rangle,\ p \text{ true at } s\}$

$\mathrm{Hit}_p$ is a continuous function: if $J(m_\mathcal{F})(f,g) < \epsilon$, then $|\mathrm{Hit}_p(f) - \mathrm{Hit}_p(g)| < \epsilon$.

So, using the continuous mapping theorem, we deduce that if $\{\langle s_i,c_i\rangle\}$ converges to
$\langle s,c\rangle$ then the sequence of expected times to hit a $p$-state from $\{\langle s_i,c_i\rangle\}$ converges to the
expected time to hit a $p$-state from $\langle s,c\rangle$. In fact, since in this case $\mathrm{Hit}_p$ is a 1-Lipschitz
function, we can also deduce the rate of convergence using [Whi02]: if $m_\mathcal{F}(\langle s_i,c_i\rangle,\langle s,c\rangle) <
\epsilon$, then the expected times to hit a $p$-state differ by at most $2\epsilon$.
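
For a piecewise-constant path the infimum in $\mathrm{Hit}_p$ is the start of the first piece whose state satisfies $p$. A minimal sketch under assumptions that are not from the paper: a path is encoded as a list of `(start_time, state)` pairs, the clock component is omitted because $\mathrm{Hit}_p$ ignores it, and a finite weighted list of paths stands in for the distribution of traces.

```python
import math

def hit_time(path, p_holds):
    """Hit_p(f): earliest time at which the path is in a state satisfying p.

    `path` is a list of (start_time, state) pairs sorted by start_time; the path
    holds `state` from `start_time` until the next start time (right-continuous).
    Returns math.inf if p never holds (infimum of the empty set).
    """
    for start, state in path:
        if p_holds(state):
            return start
    return math.inf

def expected_hit_time(weighted_paths, p_holds):
    """Expectation of Hit_p over a finite list of (probability, path) pairs."""
    return sum(prob * hit_time(path, p_holds) for prob, path in weighted_paths)

# Hypothetical paths whose jump times differ by at most 0.1.
f = [(0.0, "idle"), (1.5, "busy"), (4.0, "done")]
g = [(0.0, "idle"), (1.6, "busy"), (3.9, "done")]

def is_done(state):
    return state == "done"

gap = abs(hit_time(f, is_done) - hit_time(g, is_done))
mean = expected_hit_time([(0.25, f), (0.75, g)], is_done)
```

Shifting every jump time by at most 0.1 moves the hitting time by at most 0.1, in line with the Lipschitz behaviour claimed above.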

8.2. Expected rewards. Let $r_i$ be an assignment of rewards to states $s_i$ such that if
$r_i \ne r_j$, then states $s_i, s_j$ differ in the truth-assignment of at least one proposition. This
restriction can be viewed purely as a modelling constraint.

Define a function $R : \mathcal{G} \to [0,\infty)$ by:

$R(\langle s_i,c\rangle) = r_i$

Under the hypothesis that distinct rewards are distinguished propositionally, $R$ defines a
continuous function.

For any finitely-varying $f$, consider $\mathrm{CumR}(f)$, a continuous function of $t$ defined as
follows:

$\mathrm{CumR}(f)(T) = \int_0^T R(f(t))\, dt$

By standard results (e.g. see [Whi02]) $\mathrm{CumR}$ is a continuous function from $D_{m_\mathcal{F}}[0,\infty)$
to $(C, \mathrm{unif})$ where $C$ is the space of continuous functions from $[0,\infty) \to [0,\infty)$ with the
uniform metric:

$\mathrm{unif}(f,g) = \sup_t |f(t) - g(t)|$

Consider the following continuous functions from $(C, \mathrm{unif})$ to $[0,\infty)$:

• For a fixed T, cumulative reward at time T. 

• For a fixed T, average reward per unit time at T. 

• The supremum of the times T at which cumulative reward is less than a fixed v, for 
some value v. 

• The supremum of the times T at which the average reward is less than a fixed v, 
for some value v. 

In each of these cases, by composing with $\mathrm{CumR}$, we get a continuous function from $D_{m_\mathcal{F}}[0,\infty)$
to $[0,\infty)$. So, the continuous mapping theorem applies, and we deduce that if $\{\langle s_i,c_i\rangle\}$ converges
to $\langle s,c\rangle$ then the sequence of expected values from $\{\langle s_i,c_i\rangle\}$ converges to the expected
value at $\langle s,c\rangle$.
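
For a piecewise-constant path the integral defining $\mathrm{CumR}$ is a finite sum. A minimal sketch of the first two observables in the list, assuming a hypothetical encoding of a path as `(start_time, state)` pairs beginning at time 0 and a dictionary of rewards per state (none of these names come from the paper):

```python
def cumulative_reward(path, reward, horizon):
    """CumR(f)(T): integral of R(f(t)) over [0, T] for a piecewise-constant path."""
    total = 0.0
    for i, (start, state) in enumerate(path):
        if start >= horizon:
            break  # this piece and all later ones lie beyond the horizon
        # The last piece extends indefinitely, so it is cut off at the horizon.
        end = path[i + 1][0] if i + 1 < len(path) else horizon
        total += reward[state] * (min(end, horizon) - start)
    return total

def average_reward(path, reward, horizon):
    """Average reward per unit time at a fixed horizon T > 0."""
    return cumulative_reward(path, reward, horizon) / horizon

# Hypothetical path and reward assignment.
path = [(0.0, "idle"), (2.0, "busy"), (5.0, "idle")]
reward = {"idle": 0.0, "busy": 3.0}

cum_at_4 = cumulative_reward(path, reward, 4.0)
avg_at_4 = average_reward(path, reward, 4.0)
cum_at_6 = cumulative_reward(path, reward, 6.0)
```

Averaging such values over a finite weighted set of paths gives an empirical stand-in for the expected values discussed above.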

9. Functional characterization of uniformity 

In an early treatment of metrics for LMPs [DGJP99, DGJP04] the metric was defined
through a class of functions closely related to a modal logic. The idea was that, in a
probabilistic setting, random variables play a role analogous to modal formulas. A class of
random variables (measurable functions) was defined on the state space and the metric was
obtained by taking the sup over this class of functions. The coinductive definition of the
metric came later [vBW01b, DGJP02] and was shown to be the same as the metric defined
logically. In the present work we develop the subject along similar lines. We already have
the fixed-point version of the metric: we now give the "logical" view. In this section, we
provide an explicit construction of the maximum fixed point by considering a class of
[0, 1]-valued functions.

9.1. Function expressions. 

Definition 9.1. Fix $0 < k < \frac{1}{2}$. The syntax of function expressions is given by:

$F_k ::= 1 \mid p \mid \min(F_k, F_k) \mid h \circ F_k \mid \int G_k$

$G_k ::= C(F_k)(t) \mid \min(G_k(t), G_k(t')) \mid h \circ G_k(t)$

where $p$ ranges over atomic propositions, $h$ is any Lipschitz function on $[0,1]$, and $t \in [0,\infty)$.

The subscript $k$ gives the discount factor. We will not usually write this factor explicitly.
Intuitively the F-function expressions are evaluated at generalized states, and the G-function
expressions are evaluated on finitely-varying paths at the times shown. In a temporal logic
with state and path formulas, like CTL*, the path formulas are implicitly evaluated at the
first time. This may not seem to be the case with a formula like Gp (□p in LTL notation)
but is clear with a formula like Xp (○p). In our G-formulas we cannot have a first time: we
provide the time as an explicit parameter. One can imagine a much richer language of path
formulas: for example, one might have time averages along a path. However, the present
language suffices for the definition of the metric.

As preliminary intuition, $1$ corresponds to the formula true, $\min(\cdot,\cdot)$ corresponds to
conjunction³, and $h \circ f$ encompasses both testing (via $h(x) = \max(x - q, 0)$) and negation
(via $h(x) = 1 - x$). At a generalized state $\langle s,c\rangle$, $\int G(t)$ yields the (discounted) expectation
of $G(t)$ with respect to the distribution of Traces($\langle s,c\rangle$). The intuition underlying $C(F)(t)$ has been
discussed in section 0]: at a finitely-varying function $f$, $C(F)(t)$ yields the evaluation at
time $t$ of a time-smoothed variant of $f$.

We formalize these intuitions below. The interpretations of F-function expressions and 
G-function expressions yield maps whose range is the interval [0, 1]. 

• The domain of F-function expressions is Q, the set of generalized states. 

• The domain of G-function expressions is the set of finitely-varying functions with 
range Q. 

Fix a GSMP. F-function expressions are evaluated as follows at a generalized state $\langle s,c\rangle$:

$p(\langle s,c\rangle) = 1$ iff $p$ is true at $s$

$1(\langle s,c\rangle) = 1$

$\min(F_1,F_2)(\langle s,c\rangle) = \min(F_1(\langle s,c\rangle), F_2(\langle s,c\rangle))$

$(h \circ F)(\langle s,c\rangle) = h(F(\langle s,c\rangle))$

$(\int G(t))(\langle s,c\rangle) = k \times \int G(t)(f)\, d\mu$

where $\mu$ is the distribution of Traces($\langle s,c\rangle$). Note that in this definition $f$ varies among
the paths of Traces($\langle s,c\rangle$), so $G(t)$ is a measurable function on the space of these paths and
$\mu$ is a measure on these paths.

³ $\max(\cdot,\cdot)$ is definable as $1 - \min(1 - \cdot, 1 - \cdot)$ in both classes of function expressions.
G-function expressions are evaluated as follows at a finitely-varying function $f$:

$C(F)(t)(f) = \sup_{t'} \{F(f(t')) - |t' - t|\}$

$\min(G_1(t_1),G_2(t_2))(f) = \min(G_1(t_1)(f), G_2(t_2)(f))$

$(h \circ G(t))(f) = h(G(t)(f))$

Thus, for $f \in$ Traces($\langle s,c\rangle$), $C(F)(t)(f)$ is the upper Lipschitz approximation to $F \circ f$
evaluated at $t$.
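
On a piecewise-constant path the supremum defining $C(F)(t)(f)$ reduces to a maximum over pieces, each penalised by its distance from $t$. A minimal sketch, assuming a hypothetical encoding of paths as `(start_time, state)` pairs with the last piece extending to infinity, and a function `F` given directly on states; neither is the paper's notation.

```python
import math

def upper_lipschitz_at(path, F, t):
    """C(F)(t)(f) = sup over t' of F(f(t')) - |t' - t| for a piecewise-constant f."""
    best = -math.inf
    for i, (start, state) in enumerate(path):
        end = path[i + 1][0] if i + 1 < len(path) else math.inf
        # Distance from t to the closure of the piece [start, end). Using the
        # closure is what makes this a supremum: at the right end point the
        # value is approached but not attained.
        if t < start:
            dist = start - t
        elif t > end:
            dist = t - end
        else:
            dist = 0.0
        best = max(best, F(state) - dist)
    return best

# Hypothetical path and [0, 1]-valued function on states.
path = [(0.0, "a"), (1.0, "b"), (3.0, "a")]
values = {"a": 0.2, "b": 0.9}

def F(state):
    return values[state]

at_half = upper_lipschitz_at(path, F, 0.5)   # "b" is 0.5 away: max(0.2, 0.9 - 0.5)
at_two = upper_lipschitz_at(path, F, 2.0)    # inside the "b" piece
at_zero = upper_lipschitz_at(path, F, 0.0)   # "b" is 1.0 away: max(0.2, 0.9 - 1.0)
```

As a function of $t$ the result is 1-Lipschitz and dominates $F(f(t))$, which is the sense in which it is an upper Lipschitz approximation.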

9.2. A pseudometric from function expressions. We define a pseudometric $d_k$ as follows.

Definition 9.2.

$d_k(\langle s,c\rangle, \langle s',c'\rangle) = \sup_{F_k} |F_k(\langle s,c\rangle) - F_k(\langle s',c'\rangle)|$.

We proceed to show that the uniformities defined by these two metrics agree. Unlike
the case with discrete time systems the metrics themselves do not agree: it is the uniformity
that is common to the two of them.

Theorem 9.3. The uniformity induced by $d_k$ coincides with the uniformity induced by
$m_{\mathcal{F}_k}$, the maximum fixed point of $\mathcal{F}_k$.

Proof. We demonstrate that the identity function is a uniformly continuous isomorphism
between $\mathcal{G}$ equipped with the metrics $d_k$ and $m_{\mathcal{F}_k}$.

Consider the identity function from domain with metric $m_{\mathcal{F}_k}$ and range with metric $d_k$.
A mutual inductive proof shows that:

• Every F-function-expression $F$ satisfies:

$|F(\langle s,c\rangle) - F(\langle s',c'\rangle)| \le m_{\mathcal{F}_k}(\langle s,c\rangle,\langle s',c'\rangle)$

• Every G-function-expression $G$ satisfies

$|G(t)(f) - G(t)(g)| \le 2 \times J(m_{\mathcal{F}_k})(f,g)$

The key case in this proof is the case for the F-function-expression $\int G(t)$. For this case,
the induction on $G$ and $k < \frac{1}{2}$ yields that $k \times G(t)$ is 1-Lipschitz for metric $J(m_{\mathcal{F}_k})$. So, by
the definition of the Wasserstein metric, we get the required inductive result for $\int G(t)$.

This shows that the identity function from domain with metric $m_{\mathcal{F}_k}$ and range with
metric $d_k$ is uniformly continuous.

We prove the converse below. We show that $m_{\mathcal{F}_k}$ is dominated by $d_k$. We use the fact
that the closure ordinal of $\mathcal{F}_k$ is $\omega$. Let $m_0 = \top$, $m_{i+1} = \mathcal{F}_k(m_i)$. We show by induction on
$i$ that each $m_i$ is dominated by $d_k$. We proceed in the following two steps:

• Let $f, g$ be such that $J(m_i)(f,g) > \epsilon$, where $\epsilon > 0$. We show that there is a
G-function expression such that for some $t$, $G(t)(g) = 0$ and $G(t)(f) > \epsilon$.

If $J(m_i)(f,g) = \epsilon + \gamma$ where $\epsilon, \gamma > 0$, without loss of generality we can assume
that there is a $t$ such that:

$(\forall t')\ |t - t'| < \epsilon + \gamma \Rightarrow m_i(f(t), g(t')) > \epsilon + \gamma$

From finite-variance, we get a partition

$t_0 = t - (\epsilon + \gamma), t_1, t_2, \ldots, t_n = t + \epsilon + \gamma$

such that: 
- Let g(U) = (s,c). Then: (Vi* < t < t i+1 )g{t) = (s,c- f c \t - U\).

I H fi+l I 2 

From the assumption that rrii is dominated by dk , there are i 7 - function expressions 
Fx,...,F n such that - Fj(g(ti))\ > e + 1. Using lemma IQ1 we have (Vi* < 

t' < t i+1 )mjr k (g(ti) , g(t')) < and thus we deduce that \Fj(f(t)) - Fj(g(t'))\ > e, 
for all $t_i \le t' < t_{i+1}$. Using $\min$ and $h \circ$, without loss of generality, we can assume
that $F_j(f(t)) > \epsilon$ and $F_j(g(t_i)) = 0$. Consider $F = \min(F_1, \ldots, F_n)$ and $C(F)(t)$. It
evaluates to $0$ on $g$ and to a value $> \epsilon$ on $f$.
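• Let $\langle s,c\rangle, \langle s',c'\rangle$ be such that $W(J(m_i))(\langle s,c\rangle,\langle s',c'\rangle) = \epsilon + \gamma$, where $\gamma > 0$. We
show that there is an F-function expression such that $F(\langle s',c'\rangle) = 0$ and $F(\langle s,c\rangle) > \epsilon$.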

Following the proof of lemma IQfl we get finite sets of traces L\,Li2 satisfying 
W( J(rrii))(Li, L2) > e + ^ and it suffices to prove the result for finite linear combi- 
nations. 

From the above item, for each pair of traces (one from $L_1$ and the other from
$L_2$), there are G-function expressions that are non-zero only on $f_1, \ldots, f_n$ and
zero on $f'_1, \ldots, f'_m$ and which yield arbitrarily close approximations to the distance
between the pairs. The result now follows by considering the maximum of these G-function expressions.

□ 

10. Conclusions 

We have given a pseudo-metric analogue of bisimulation for GSMPs. We have shown 
that this really depends on the underlying uniformity and that quantities of interest are 
continuous in this metric. We have given a coinduction principle and a logical character- 
ization reminiscent of previous work for weak bisimulation of a discrete time concurrent 
Markov chain. 

The previous approaches to bisimulation work well for CTMCs, precisely because
the distribution is memoryless: at any given instant the expected duration in
a state and the transition probabilities only depend on the current state of the system, and
thus one can define a bisimulation on the state space. In contrast, the problem of describing
bisimulation for real-time processes that have general distributions, rather than memoryless
distributions, has been vexing. In the present work, we have shifted emphasis to the
generalized states that incorporate time and have not tried to define a bisimulation on the ordinary
states. Because the generalized states embody the quantitative temporal information we
have to work metrically; an attempt to define bisimulation directly would have fallen afoul
of the approximate nature of the timing information.

If we want to move to continuous state spaces and stochastic hybrid systems, the
whole dynamical formalism has to be different: one can no longer think of paths as cadlag
functions. We will have to use stochastic differential equations to describe the systems and
the space of sample paths for the trajectories. That is a subject for future work and one that
we have been heading towards from the inception of our work on LMPs [BDEP97, DEP02].